About the Data
All data is from the American Community Survey 2010-2012 Public Use Microdata Series. The data set contains 5 files, split by level of education and age.
Because my focus is on questions that compare graduate degrees with non-graduate degrees, I plan to start with the files majors-list.csv, recent-grads.csv and grad-students.csv, and bring in the others when required.
majors_list description
| Header | Description |
|---|---|
| FOD1P | Recoded field of degree - first entry |
| Major_code | Major code, FOD1P in ACS PUMS |
| Major | Major description |
| Major_Category | Category of major from Carnevale et al |
recent_grads description
| Header | Description |
|---|---|
| Rank | Rank by median earnings |
| Major_code | Major code, FOD1P in ACS PUMS |
| Major | Major description |
| Major_category | Category of major from Carnevale et al |
| Total | Total number of people with major |
| Sample_size | Sample size (unweighted) of full-time, year-round ONLY (used for earnings) |
| Men | Male graduates |
| Women | Female graduates |
| ShareWomen | Women as share of total |
| Employed | Number employed (ESR == 1 or 2) |
| Full_time | Employed 35 hours or more |
| Part_time | Employed less than 35 hours |
| Full_time_year_round | Employed at least 50 weeks (WKW == 1) and at least 35 hours (WKHP >= 35) |
| Unemployed | Number unemployed (ESR == 3) |
| Unemployment_rate | Unemployed / (Unemployed + Employed) |
| Median | Median earnings of full-time, year-round workers |
| P25th | 25th percentile of earnings |
| P75th | 75th percentile of earnings |
| College_jobs | Number with job requiring a college degree |
| Non_college_jobs | Number with job not requiring a college degree |
| Low_wage_jobs | Number in low-wage service jobs |
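The derived columns in the table above follow directly from the count columns. As a quick sanity check, the sketch below recomputes `Unemployment_rate` and `ShareWomen` from their definitions. The rows are made-up toy values, not taken from recent-grads.csv, and only base R is used so the snippet runs on its own.

```r
# Toy rows with invented counts (NOT real values from recent-grads.csv)
toy <- data.frame(
  Major      = c("Hypothetical A", "Hypothetical B"),
  Total      = c(1000, 500),
  Women      = c(400, 300),
  Employed   = c(900, 480),
  Unemployed = c(100, 20)
)

# Unemployment_rate = Unemployed / (Unemployed + Employed)
toy$Unemployment_rate <- toy$Unemployed / (toy$Unemployed + toy$Employed)

# ShareWomen = Women as share of Total
toy$ShareWomen <- toy$Women / toy$Total

print(toy)
```

Running the same two formulas against the real columns and comparing them with the shipped `Unemployment_rate` and `ShareWomen` values would be one way to confirm the column descriptions.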
grad_students description
| Header | Description |
|---|---|
| Major_code | Major code, FOD1P in ACS PUMS |
| Major | Major description |
| Major_category | Category of major from Carnevale et al |
| Grad_total | Total number of graduate students with major |
| Grad_sample_size | Graduate students sample size (unweighted) of full-time, year-round ONLY (used for earnings) |
| Grad_employed | Number of graduate students employed (ESR == 1 or 2) |
| Grad_full_time_year_round | Employed at least 50 weeks (WKW == 1) and at least 35 hours (WKHP >= 35) |
| Grad_unemployed | Number of graduate students unemployed (ESR == 3) |
| Grad_unemployment_rate | Graduate students Unemployed / (Unemployed + Employed) |
| Grad_median | Median earnings of Graduate full-time, year-round workers |
| Grad_P25th | 25th percentile of graduate earnings |
| Grad_P75th | 75th percentile of graduate earnings |
| Nongrad_total | Total number of people without a graduate degree with major |
| Nongrad_employed | Number of people without a graduate degree employed (ESR == 1 or 2) |
| Nongrad_full_time_year_round | Employed at least 50 weeks (WKW == 1) and at least 35 hours (WKHP >= 35) |
| Nongrad_unemployed | Number of people without a graduate degree unemployed (ESR == 3) |
| Nongrad_unemployment_rate | Non-graduate Unemployed / (Unemployed + Employed) |
| Nongrad_median | Median earnings of non-graduate full-time, year-round workers |
| Nongrad_P25th | 25th percentile of non-graduate earnings |
| Nongrad_P75th | 75th percentile of non-graduate earnings |
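Because each major carries both a `Grad_median` and a `Nongrad_median`, the two can be compared row by row. The sketch below shows one possible way to express that gap as a relative earnings premium and summarise it per category. The column name `premium`, the category labels and all numbers are hypothetical and chosen only for illustration; base R is used so the snippet is self-contained.

```r
# Toy rows with invented earnings (NOT real values from grad-students.csv)
toy <- data.frame(
  Major_category = c("Cat X", "Cat X", "Cat Y"),
  Grad_median    = c(80000, 90000, 60000),
  Nongrad_median = c(64000, 60000, 50000)
)

# Relative premium of graduate over non-graduate median earnings
# ("premium" is a hypothetical column name introduced here)
toy$premium <- (toy$Grad_median - toy$Nongrad_median) / toy$Nongrad_median

# Median premium per major category
by_category <- aggregate(premium ~ Major_category, data = toy, FUN = median)

print(toy)
print(by_category)
```

A relative rather than absolute difference is used here so that high-paying and low-paying majors can be compared on the same scale; an absolute difference would be an equally valid choice.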
Why the dataset?
I found the data interesting and wanted to understand what effect a graduate degree, and the major of that degree, has on employment and pay.
library(tidyverse)
library(naniar)
grad_students <- read.csv("./grad-students.csv")
majors_list <- read.csv("./majors-list.csv")
recent_grads <- read.csv("./recent-grads.csv")
#convert df to tibble
grad_students <- as_tibble(grad_students)
majors_list <- as_tibble(majors_list)
recent_grads <- as_tibble(recent_grads)
vis_miss(grad_students)
vis_miss(majors_list)
vis_miss(recent_grads)
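The `vis_miss()` plots give a visual overview of missing values. A numeric summary can complement them when the exact counts matter. The following is a minimal sketch in base R on a toy data frame with invented values; the same calls should work unchanged on `grad_students`, `majors_list` and `recent_grads`.

```r
# Toy data frame with a few deliberately missing cells (invented values)
toy <- data.frame(
  Major = c("A", "B", "C", "D"),
  Men   = c(10, NA, 30, 40),
  Women = c(5, NA, NA, 20),
  stringsAsFactors = FALSE
)

# Number of missing values per column
missing_count <- colSums(is.na(toy))

# Percentage of missing values per column
missing_pct <- 100 * colMeans(is.na(toy))

# Rows that contain at least one missing value
incomplete <- toy[!complete.cases(toy), ]

print(missing_count)
print(missing_pct)
print(incomplete)
```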
# Median income of recent graduates, one box per major category
ggplot(recent_grads, aes(x = Median, y = Major_category)) +
  geom_boxplot(width = 0.4, fill = "white", na.rm = TRUE) +
  geom_jitter(aes(color = Major_category),
              width = 0.1, size = 0.5, na.rm = TRUE) +
  labs(y = "Major Category", x = "Median income by major")
options(scipen=999)
# Number of employed recent graduates, by major category
ggplot(recent_grads, aes(x = Employed, fill = Major_category)) +
  geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity", bins = 10) +
  labs(fill = "")
# Frequency of median income values, by major category
ggplot(recent_grads, aes(x = Median, colour = Major_category)) +
  geom_freqpoly(binwidth = 10000)
# Median income of graduate degree holders, one box per major category
ggplot(grad_students, aes(x = Grad_median, y = Major_category)) +
  geom_boxplot(width = 0.4, fill = "white", na.rm = TRUE) +
  geom_jitter(aes(color = Major_category),
              width = 0.1, size = 0.5, na.rm = TRUE) +
  labs(y = "Major Category", x = "Median income (graduate degree holders)")
# Median income of non-graduate degree holders, one box per major category
ggplot(grad_students, aes(x = Nongrad_median, y = Major_category)) +
  geom_boxplot(width = 0.4, fill = "white", na.rm = TRUE) +
  geom_jitter(aes(color = Major_category),
              width = 0.1, size = 0.5, na.rm = TRUE) +
  labs(y = "Major Category", x = "Median income (non-graduate degree holders)")
# Number employed among graduate degree holders, by major category
ggplot(grad_students, aes(x = Grad_employed, fill = Major_category)) +
  geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity", bins = 10) +
  labs(x = "Number employed (graduate degree holders)", fill = "")
ggplot(grad_students, aes(x = Grad_median, colour = Major_category)) +
  geom_freqpoly(binwidth = 10000)
# Number employed among non-graduate degree holders, by major category
ggplot(grad_students, aes(x = Nongrad_employed, fill = Major_category)) +
  geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity", bins = 10) +
  labs(x = "Number employed (non-graduate degree holders)", fill = "")
ggplot(grad_students, aes(x = Nongrad_median, colour = Major_category)) +
  geom_freqpoly(binwidth = 10000)
# Median income per major category, shaded by the number of men
ggplot(recent_grads, aes(x = Major_category, y = Median, fill = Men)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
# Median income per major category, shaded by the number of women
ggplot(recent_grads, aes(x = Major_category, y = Median, fill = Women)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
# Number of men and women in each major category
ggplot(recent_grads, aes(Men, Women)) +
  geom_point() +
  stat_smooth() +
  facet_wrap(~Major_category)
# Grad and Nongrad median salary across all major categories
ggplot(grad_students, aes(Grad_median, Nongrad_median)) +
  geom_point() +
  stat_smooth() +
  facet_wrap(~Major_category)
# One histogram of median salary per major category, first for recent graduates
major_categories_lst <- unique(majors_list$Major_Category)
for (major_cat in major_categories_lst) {
  if (!is.na(major_cat)) {
    filtered_data <- filter(recent_grads, Major_category == major_cat)
    print(ggplot(filtered_data, aes(x = Median, fill = Major)) +
      geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity", bins = 20) +
      labs(x = "Median Salary", fill = "", title = major_cat))
  }
}
# Same plots for graduate degree holders
for (major_cat in major_categories_lst) {
  if (!is.na(major_cat)) {
    filtered_data <- filter(grad_students, Major_category == major_cat)
    print(ggplot(filtered_data, aes(x = Grad_median, fill = Major)) +
      geom_histogram(color = "#e9ecef", alpha = 0.6, position = "identity", bins = 20) +
      labs(x = "Median Salary", fill = "", title = major_cat))
  }
}